Systems for still-to-video face recognition (FR) seek to detect the presence of target individuals based on reference facial still images or mug-shots. These systems encounter several challenges in video surveillance applications due to variations in capture conditions (e.g., pose, scale, illumination, blur and expression) and to camera inter-operability. Beyond these issues, few reference stills are available during enrollment to design representative facial models of target individuals. Systems for still-to-video FR must therefore rely on adaptation, multiple face representations, or synthetic generation of reference stills to enhance the intra-class variability of face models. Moreover, many FR systems only match high-quality faces captured in video, which further reduces the probability of detecting target individuals. Instead of matching faces captured through segmentation to reference stills, this paper exploits Adaptive Appearance Model Tracking (AAMT) to gradually learn a track-face-model for each individual appearing in the scene. The Sequential Karhunen–Loeve technique is used for online learning of these track-face-models within a particle filter-based face tracker. Meanwhile, these models are matched over successive frames against the reference still images of each target individual enrolled in the system, and the matching scores are accumulated over several frames for robust spatiotemporal recognition. A target individual is recognized if the scores accumulated for a track-face-model over a fixed time surpass a decision threshold. The main advantage of AAMT over traditional still-to-video FR systems is the greater diversity of facial representations that may be captured during operations, which can lead to better discrimination for spatiotemporal recognition.
Compared to state-of-the-art adaptive biometric systems, the proposed method selects facial captures to update an individual's face model more reliably because it relies on information from tracking. Simulation results obtained with the Chokepoint video dataset indicate that the proposed method provides a significantly higher level of performance than state-of-the-art systems when a single reference still per individual is available for matching. This higher level of performance is achieved when the diverse facial appearances captured in video through AAMT correspond to those of the reference stills.
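The spatiotemporal decision rule described above — accumulating per-track matching scores over a fixed time window and recognizing a target once the accumulated score surpasses a decision threshold — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the class name, window size, threshold value, and summation-based accumulation are all assumptions for the example.

```python
from collections import defaultdict, deque


class SpatiotemporalRecognizer:
    """Accumulates matching scores per (track, target) pair over a
    sliding window of frames and declares recognition when the
    accumulated score surpasses a decision threshold.

    window and threshold are illustrative values, not taken from
    the paper.
    """

    def __init__(self, window=30, threshold=15.0):
        self.window = window
        self.threshold = threshold
        # One bounded score buffer per (track_id, target_id) pair;
        # old frames fall out automatically once the window is full.
        self.scores = defaultdict(lambda: deque(maxlen=window))

    def update(self, track_id, target_id, score):
        """Record one frame's matching score between a track-face-model
        and a target's reference still; return True if the scores
        accumulated over the window now surpass the threshold."""
        buf = self.scores[(track_id, target_id)]
        buf.append(score)
        return sum(buf) >= self.threshold


# Usage: feed the per-frame matching scores produced by the tracker.
recognizer = SpatiotemporalRecognizer(window=5, threshold=2.5)
decisions = [recognizer.update("track_1", "target_A", 0.9) for _ in range(4)]
```

Accumulating over several frames in this way makes the decision robust to single-frame matching errors: one spurious high score is unlikely to push the windowed sum past the threshold on its own.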